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ABSTRACT 

Uncoupling algorithms transform a linear differential system 
of first order into one or several scalar differential equations. 
We examine two approaches to uncoupling: the cyclic-vector 
method (CVM) and the Danilevski-Barkatou-Zurcher algo- 
rithm (DBZ). We give tight size bounds on the scalar equa- 
tions produced by CVM, and design a fast variant of CVM 
whose complexity is quasi-optimal with respect to the out- 
put size. We exhibit a strong structural link between CVM 
and DBZ enabling to show that, in the generic case, DBZ has 
polynomial complexity and that it produces a single equa- 
tion, strongly related to the output of CVM. We prove that 
algorithm DBZ is slower than CVM by at least two orders of 
magnitude, and provide experimental results that validate 
the theoretical complexity analyses. 

Categories and Subject Descriptors: 
1.1.2 [Computing Methodologies]: Symbolic and Alge- 
braic Manipulations — Algebraic Algorithms 

General Terms: Algorithms, Theory. 

Keywords: gauge equivalence, uncoupling, cyclic-vector 
method, complexity. 

1. INTRODUCTION 
1.1 Motivation 

Uncoupling is the transformation of a linear differential 
system of first order, Y' = MY, for a square matrix M 
with coefficients in a rational-function field K(X) of charac- 
teristic zero, into one or several scalar differential equations 

y ln) = c n _iy (n_1) H V c y, with coefficients a in K(X). 

This change of representation makes it possible to apply al- 
gorithms that input scalar differential equations to systems. 

In the present article, we examine two uncoupling algo- 
rithms: the cyclic-vector method (CVM) [25, 10, 8] and the 
Danilevski-Barkatou-Zurcher algorithm (DBZ) [13, 5, 32, 6]. 
While CVM always outputs only one equivalent differential 
equation, DBZ can decompose the system into several differ- 
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ential equations. This makes us consider two situations: the 
generic case corresponds to situations in which DBZ does 
not split the system into uncoupled equations, whereas in 
the general case, several equations can be output. 

For some applications, getting several differential equa- 
tions is more desirable than a single one. Besides, although 
the complexity of CVM is rarely discussed, its output is said 
to be "very complicated" in comparison to other uncoupling 
methods [19, 5, 32, 2, 17]. For these reasons, CVM has had 
bad reputation. Because of this general belief, uncoupling 
algorithms have not yet been studied from the complexity 
viewpoint. The lack and need of such an analysis is however 
striking when one considers, for instance, statements like 
We tried to avoid [...] cyclic vectors, because we do not have 
sufficiently strong statements about their size or complexity 
in a recent work on Jacobson forms [18]. One of our goals is 
a first step towards filling this gap and rehabilitating CVM. 

1.2 Contribution 

In relation to the differential system Y' = MY, a clas- 
sical tool of differential algebra is the map S, defined at a 
matrix or a row vector u by S(u) = uM + u'. A common 
objective of both CVM and DBZ, explicitly for the former 
and implicitly for the latter, as we shall prove in the present 
article, is to discover a basis P of K(A) n with respect to 
which the matrix of the application S is very simple. This 
matrix is the matrix P[M] defined in Section 1.5. In con- 
trast with CVM, which operates only on P, DBZ operates 
only on P[M] by performing pivot manipulations without 
considering P. An important part of our contribution is to 
provide an algebraic interpretation of the operations in DBZ 
in terms of transformations of the basis P. 

More specifically, we analyse the degree of the outputs 
from CVM and DBZ, so as to compare them. Interestingly 
enough, an immediate degree analysis of (the first, generi- 
cally dominating part of) DBZ provides us with a pessimistic 
exponential growth. We prove that this estimate is far from 
tight: the degree growth is in fact only quadratic. Surpris- 
ing simplifications between numerators and denominators 
explain the result. This leads to the first complexity analysis 
of uncoupling algorithms. It appears that, in contradiction 
to the well-established belief, DBZ and CVM have the same 
output on generic input systems. 

With respect to complexity, another surprising contribu- 
tion of the present work is that both CVM and DBZ (in 
the generic case) have polynomial complexity. Combining 
results, we design a fast variant of CVM. Even more surpris- 
ingly, it turns out that this fast CVM has better complex- 
ity (~ n e+1 d) than DBZ (~ n 5 d), when applied to systems of 



size n and degree d. As the output size is proved to be gener- 
ically asymptotically proportional to n z d, our improvement 
of CVM is quasi-optimal with respect to the output size. 
This all is well corroborated by experimental results. 

Another uncoupling algorithm is part of the work in [1]. 
We finally briefly analyse in Section 3.5 its complexity to 
find the same complexity bound as for DBZ. 

1.3 Previous work 

Uncoupling techniques have been known for a long time; 
CVM can be traced back at least to Schlesinger [25, p. 156- 
160]. Loewy [21, 22] was seemingly the first to prove that 
every square matrix over an ordinary differential field of 
characteristic zero is gauge-similar to a companion matrix. 

That a linear system of first order is equivalent to several 
scalar linear equations is a consequence of Dickson's [15, 
p. 173-174] and Wedderburn's [31, Th. 10.1] algorithmic 
results on the Smith-like form of matrices with entries in 
non-commutative domains; see also [24, Chap III, Sec. 11]. 
Its refinement nowadays called Jacobson form [20, Th. 3 
& 4] implies equivalence to a single scalar equation. This 
approach was further explored in [16, §6] and [12, 3]. 

Cope [10, §6] rediscovered Schlesinger's CVM, and addi- 
tionally showed that for a system of n equations over K(X), 
one can always choose a polynomial cyclic vector of degree 
less than n in X. A generalisation to arbitrary differential 
fields was given in [8, §7]. The subject of differential cyclic 
vectors gained a renewed interest in the 1970's, starting from 
Deligne's non-effective proof [14, Ch. II, §1]. An effective 
version, with no restriction on the characteristic, was given 
in [8, §3]. CVM has bad reputation, its output being said to 
have very complicated coefficients [19, 5, 2]. However, ad- 
hoc examples apart, very few degree bounds and complexity 
analyses are available [9, §2] . Most complexity results we are 
aware of [8, 17] only measure operations in K(X), and do 
not take degrees in X into account. 

Barkatou [5] proposed an alternative uncoupling method 
reminiscent of Danilevski's algorithm for computing weak 
Frobenius forms [13]. Danilevski's algorithm was also gen- 
eralised by Ziircher [32]; see also [6, §5]. This is what we 
call DBZ, for Danilevski, Barkatou, and Ziircher. Various 
uncoupling algorithms are described and analysed in [17, 
23], including the already mentioned algorithm from [1]. 

1.4 Notation and Conventions 

Algebraic Structures and Complexity. Let K denote 
a field of characteristic zero, Kd[X] the set of polynomials 
of degree at most d, and M n (S) and Mi,„(S), respectively, 
the sets of square matrices of dimension n x n and of row 
vectors of dimension n, each with coefficients in some set S. 
The arithmetic size of a matrix in A4 n (K(X)) is the number 
of elements of K in its dense representation. 

We consider the arithmetic complexity of algorithms, that 
is, the number of operations they require in the base field K. 
For asymptotic complexity, we employ the classical notation 
O(-), £}(•), and &(■), as well as the notation g = O(f) if there 
exists k such that g/f = 0(\og k n). We denote by M(d) 
(resp. MM(n,d)) the arithmetic complexity of the multipli- 
cation in KdLY] (resp. in A1n(lKd[X])). When the complex- 
ity of an algorithm is expressed in terms of M and MM, it 
means that the algorithm can use multiplication (in K[A] 
and A1n(K[X])) as "black boxes". Arithmetic complexities 
are summarised in the following table, where 8 is a constant 



between 2 and 3 that depends on the matrix-multiplication 
algorithm used. For instance, 8 = 3 for the schoolbook algo- 
rithm, and 8 = log 2 (7) w 2.807 for Strassen's algorithm [29]. 
The current tightest upper bound is 8 < 2.3727 [30], follow- 
ing work of Coppersmith- Winograd [11], and Stothers [28]. 
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Generic Matrices. The notion of genericity is useful to 
analyse algorithms on inputs that are not "particular". For 
parameters in some K7", a property is classically said to be 
generic if it holds out of the zero set of a non-zero r-variate 
polynomial. To define the notion of generic matrices, we 
identify M £ A4 n (K d [X]) with the family {m-i,j,fc}, indexed 
by 1 < i,j < n, < k < d, of its coefficients in K. 

Conventions. In this text, M is always the input of the 
uncoupling algorithms. It is assumed to be a matrix in 
q(X)- 1 M„(K d [X]) with q(X) G K d [X]. It defines 6 on a 
matrix or a row vector u by 5(u) = uM + v! . 

For a matrix A, we respectively denote its ith row and 
jth column by A i>t and A*j. We write VJoin(r' 1 - ) , . . . , r- n ') 
for the matrix A such that for all i, Aj,, = r^\ We define 
the rows e; by /„ = VJoin(ei, . . . ,e„). 

A square matrix C is said to be companion if beside zero 
coefficients, it has Is on its upper diagonal and arbitrary 
coefficients on its last row, c = (co, . . . , c n _i). Thus: 

C = VJoin(e 2 , . . . , e„_i, c). (1) 

We say A has its ith row in companion shape if Ai.*, = e<+i. 

We write diag(_B' x ', . . . , B^) for a diagonal block matrix 
given by square blocks B^\ 

Degrees of rational functions and matrices. 

In the present paper, the degree of a rational function is 
the maximum of the degrees of its numerator and denom- 
inator. The degree of a vector or a matrix with rational- 
function coefficients is the maximum of the degrees of its 
coefficients. The following lemma expresses the generic de- 
grees encountered when solving a generic matrix. 

Lemma 1 Let A be a matrix in A4„(K[X]). Define Oi as 
deg( J 4i i ») and D as y^. a,j. Then, deg(det(A)) < D and, 
for all i, deg(det( J 4)(A~ 1 )», l ) < D - a*. When A is generic 
with deg(j4i,*) = Oi for all i, those bounds are reached. 

Proof. Proofs use classical techniques and are omitted. 
We simply observe that diag(x ai , . . . , x an ) N reaches the an- 
nounced bounds when N € A4 n (K\{0}) and det N ^ 0. □ 

1.5 Companion matrices and uncoupling 

For an invertible matrix P, let us perform the change of 
unknowns Z = PY in a system Y' = MY. Then, Z' = 
PY' + P'Y = (PM + P')P- 1 Z. The system is therefore 
equivalent to Z' = P[M]Z where P[M] denotes (P M + 
P')P~ , in the sense that the solutions of both systems, 
whether meromorphic or rational, are in bijection under P. 

We call gauge transformation of a matrix A by an invert- 
ible matrix P the matrix P[A\ = (P A+P')p-\ When B = 
P[v4], we say that A and B are gauge- similar. The gauge- 
similarity relation is transitive since P[Q[A]] = (PQ)[A]. 
With the notation introduced above, P[M] = S(P) P _1 . 



The following folklore theorem relates the solutions of a 
system with the solutions of the uncoupled equations ob- 
tained from a suitable gauge-similar system: it states that, 
to uncouple the system Y' = MY, it suffices to find an in- 
vertible matrix P such that P[M] is in diagonal companion 
block form. This is the main motivation for uncoupling. We 
omit its proof, as it has similarity with the proof of Corol- 
lary 5, and because we use no consequence of it later in this 
article. (We write df instead of /' for derivations.) 

Theorem 2 Let P be an invertible matrix such that 

P[Af]=diag(C (1) ,...,C (t) ) (2) 

with companion of dimension ki. Denote the last row 
of C (i) by (cg\ . . . , • Then dY = MY if and only if 

PY = VJoin(Z (1) ,...,Z w ) 

where Z {l) = (z&, dz^, . . . , d k *- 1 z®) T and 

d K z (i) = c ^_ id ki-i z d) + ... + C W Z ('>. 

2. CYCLIC- VECTOR METHOD 

Two versions of CVM are available, depending on how 
the first row of P is obtained: a version ProbCV picks this 
first row at random, and thus potentially fails, but with 
tiny probability; a deterministic version DetCV computes a 
first row in such a way that the subsequent process provably 
cannot fail. In both cases, CVM produces no non-trivial di- 
agonal companion block decomposition but only one block. 

We present the randomised CVM only, before analysing 
the degree of its output and giving a fast variant. 

2.1 Structure theorems 

Let A k (u) denote the matrix VJoin(u, 8(u), . . . , S k ~ 1 (u)) 
of dimension k x n. The diagonal companion block decom- 
position (2) is based on the following folklore theorem. 

Theorem 3 Let P be an invertible matrix, then there exists 
a companion matrix C of dimension k such that 

P[M] = ( c t ° t ) (3) 

if and only if there exists a vector u such that P = ( ) 
and 8 k (u) G Vect(u, 8{u), . . . ,8 k ~ 1 (u)). 

PROOF. Set (g) := P where U has k rows. Equality (3) 
is equivalent to 8 ( g) = ( ° °) ( g), then with S(U) = CU. 
This can be rewritten: 

VJoin(«5(f7 1 ,»), . . . , S(U kl ,)) = VJoin(C/ 2 ,», . . . , U k ,„C h ,.U). 

Set u to the first row of U. This equation is satisfied if and 
only if U = A k (u) and 8 k (u) G Vect(A fe (V)). □ 

The following corollaries for partial companion decom- 
position and diagonal companion block decomposition are 
proved in a very similar fashion to the preceding theorem. 
They will be used for the analysis of CVM and DBZ. 

Corollary 4 Let P be an invertible matrix, then P[M] has 
its first k — 1 rows in companion shape if and only if there 
exists a row vector u such that P= ( **(«)"). 



Corollary 5 Let P be an invertible matrix and {C }i<j<t 
a family of companion matrices of dimension ki, then 

P[M] = diag(C (1) ,...,C* (t) ) 

if and only if there exist t row vectors {tt }i<i<t such that 
P = VJoin(A ftl ( M (1) ),...,A' ct (u (t) )) and for alii, 8 k *(u {i) ) 
is in Vect(A fe *(w w )). 

2.2 Classical algorithms for cyclic vectors 

The name CVM comes from the following notion. 

Definition A cyclic vector is a row vector u G K(X) n for 
which the matrix A n (u) is invertible, or, equivalently, such 
that the cyclic module generated by u over "K(X){&) is the 
full vector space M\, n (K(X)) of row vectors. 

The next folklore method [5, 8] is justified by Theorem 6 
below, which means that CVTrial will not fail too often. 



Algorithm CVTrial: Testing if a vector is a cyclic vector 

Input: M G q(X)- 1 M n (K d [X]) and u G Mi,„(K„_i[I]) 
Output: P, C with C companion and 8(P)P~ 1 = C 

1: set P to the square zero matrix of dimension n 

2: Pi,, := it 

3: for i = I to n — 1, do -P;+i,» := 6 (Pi,*) 
4: if P is not invertible, return Failure 
5: C:=8(P)P- 1 
6: return (P, C) 



Theorem 6 [10, 8] When u is generic of degree less than n, 
the matrix P = A n (u) is invertible. 

Proof. It is proved in [10, 8] that every matrix M admits 
a cyclic vector u of degree less than n. Then, det(-) is a non- 
zero polynomial function of the matrix coefficients. □ 

In ProbCV, u is chosen randomly, leading to a Las Vegas 
algorithm for finding a cyclic vector. The proof above refers 
to the theorem that every matrix M admits a cyclic vector. 
Churchill and Kovacic give in [8] a good survey of this sub- 
ject. They also provide an algorithm that we denote DetCV 
that takes as input a square matrix M and deterministically 
outputs a cyclic vector u. The arithmetic complexity of this 
algorithm is polynomial, but worse than that of ProbCV. 

2.3 Degree analysis and fast algorithm 

CVTrial computes two matrices, P and C, whose sizes we 
now analyse. We shall find the common bound 0(n 3 d). 
When u G Mi, n (K n -i[X]) is generic, this bound is reached. 
The size 0(n 3 d) of the output of CVTrial is then a lower 
bound on the complexity of any algorithm specifying CVM. 
After the remark that the simple algorithm is above this 
bound, we give a fast algorithm. 

We start by bounding the degree of the matrix A n (u). 
Following the result of Churchill and Kovacic, we make the 
assumption that deg(u) is less than n. 

Theorem 7 The row vector q k 8 k (u) consists of polynomials 
of degree at most deg(u) + kd. 

Proof. The proof proceeds by induction after noting that 
q k+i s k+i = q ( 5q k _ kq k-i q ^ s k = ( q6 _ kq >) q k 5 k n 



We list further bounds on degrees and arithmetic sizes, 
some of which are already in [9, §2] : 

Size 

0(n 3 d + n 2 deg(u)) 
0{n 4 d + n 3 deg(u)) 
0(n 2 d + ndeg(u)) 
0{n 3 d + n 2 deg(u)) 





Degree 
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deg(u) + (n - 


- \)d 


p-1 


ndeg(it) + — 


n-l) 
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deg(it) + nd 
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ndeg{u) + — 
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Theorem 8 There is an algorithm of quasi- optimal com- 
plexity O(n 0+1 d) implementing the cyclic-vector method. 

Proof. At Step 3, CVTrial computes 5(Pi,*) = P»,*M + 
P/, for successive i's. The complexities of addition and 
derivation are linear, so we focus on the product Pi,*M. 
Computing it by a vector-matrix product would have com- 
plexity Q.(n 2 id) and the complexity of the loop at Step 3 
would then be Q.(n 4 d). The row Pj,* has higher degree 
than M, so the classical idea to make it a matrix to balance 
the product applies. Let A k G K<j[X] n be the rows defined 
by Pi,* = YZLlA k X kd , and A := VJoin(A , . . . , A n -x). 
The product AM is computed in complexity 0(MM(n, d)), 
and Pi,*M is reconstructed in linear complexity, thus per- 
forming the whole loop in complexity 0(n MM(n, d)). 

Only the last row 8 n (u)P~ 1 of C needs to be computed 
at Step 5, and the size Q(n 4 d) of P _1 bans the computa- 
tion of P _1 from any low complexity algorithm. C can be 
computed in complexity (5(MM(n, nd)). This is achieved by 
solving PY = S n (u) by Storjohann's algorithm [27], which 
inputs S"(u) and P, of degree O(nd), and outputs the last 
row 5 n (u) P" 1 of CinO(MM(n,nd)log(nd)) operations. □ 

3. THE DANILEVSKI-BARKATOU- 
ZURCHER ALGORITHM 

We begin this section with a description of algorithm DBZ, 
before a naive analysis that gives exponential bounds. Ex- 
periments show a polynomial practical complexity, whence 
the need for a finer analysis. To obtain it, we develop an 
algebraic interpretation of the algorithm. 

3.1 Description of the algorithm 

The input to DBZ is a matrix M £ M n (K(X)); its output 
(P, C (1) , . . . , C (t) ) satisfies P[M] = diag(C* (1) , . . . , C (t) ) for 
companion matrices C {i) . DBZ iterates over P[JVf] to make 
it progressively diagonal block-companion. To do so, three 
sub-algorithms are used, DBZ 1 , DBZ 11 , and DBZ 111 , m or- 
der to achieve special intermediate forms for P[M]. These 
forms, respectively Shape (I), (II), and (III), are: 



(a /»)' 
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where in each case C denotes a companion matrix, v is a 
column vector, a, j3 are general matrices, and j3 is square. 

In the course of DBZ, first, DBZ 1 computes P 1 and M 1 
such that P ! [M] = M 1 has Shape (I). If M 1 = C is com- 
panion — that is, if q and j3 do not occur — then DBZ 
returns (P I ,M I ). If not, at this point, DBZ has obtained a 
first companion block C and we would hope that a is zero 
to apply DBZ recursively to f3. So, in general, DBZ tries to 
cancel a, by appealing to DBZ 11 to compute P n and M u 



such that P ll [M l \ = M u has Shape (II). If the obtained v 
is zero, then DBZ can go recursively to /3. 

If not, DBZ seems to have failed and restarts on a matrix 
P m [M n ] = M m with Shape (III), which ensures DBZ 1 can 
treat at least one more row than previously (as proved in [6]). 
Therefore, the algorithm does not loop forever. The ma- 
trix M on which DBZ starts over and the differential 
change of basis P 111 associated are computed by DBZ 111 . 

Algorithm DBZ (Danilevski-Barkatou-Zurcher) 
Input: M e M n (K(X)) 

Output: (P, C (1) , . . . , C (t) ) with C (l) companion matrices 
and P[M] = diag(C (1) , . . . , C (t) ) 
1: {P l ,M l ) ~ DBZ : (M) 
2: if M l is companion then return(P : , M l ) 
3: (P n ,M n ) := DBZ II (M I ) 

( v ° <}) := M u where v € Mn-k,i(&(X)) 
if v = then 

(P,C (1) ,...,C* (t) ) := DBZ(/3) 
return (diag(/ fc , P)P n P : , C, C (1) , . . . , C (t) ) 
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(P,C (1) ,...,C (t) ) := DBZ(M m ) 
return (PP m P n P\C W , . . . ,C {t) ) 



3.2 Description of the sub-algorithms 

We now describe DBZ 1 , DBZ 11 , and DBZ 111 in more details. 
By Ei t j(t), we denote the matrix obtained after replacing 
by t the (i,j) coefficient in the identity matrix 7„, and by 
Ei (u) the matrix obtained after replacing the ith row by the 
row vector it in /„. Let lnv (j,n - ) denote the matrix obtained 
from I n after exchanging the jth and nth rows. Set Rot = 
VJoin(e n , ei, . . . , e n _i). 

The algorithms developed below rely on a common Pivot 
subtask, which inputs (M,P,T) £ M n (K(X)) 3 with invert- 
ible P and T, and outputs the update of (M, P) under T. 
This really behaves like a Gauge transformation, chang- 
ing (M,P) to (T\M],TP). This modification of M and P 
only ensures the invariant M = P[Mi n ; t iai]. 

DBZ 1 inputs M and outputs the tuple of matrices (P 1 , M 1 ) 
with M l in Shape (I). It starts with M l = M and modifies 
its rows one by one. At the ith iteration of the loop (Step 2), 
the matrix M l has its first i — 1 rows in companion form. 
To put the ith row in companion form, DBZ 1 sets M] +l i+2 
to 1 (Step 6), and uses it as a pivot to cancel the other 
coefficients of the row (loop at Step 7) . 

Algorithm DBZ 1 
Input: M £ M n (K(X)) 

Output: (P I ,AP T ) with M 1 in Shape (I) and P l [M\ = M 1 
1: (P l ,M l ) ~ (I,M) 
for i = 1 to n — 1 do 

r := min({j | M? i and j > i} U {n + 1}) 
if r = n + 1 then return (P 1 , M l ) 



(i+l,r)\ 



(M\P L ) ~ Pivot(M\P\lnv 

{M\P l ) := Pivot(M I ,P I ,P i+1 , l+1 (M l I il+1 )- ] 
for j = 1 to n with j ^ i + 1 do 

(A/ 1 , P 1 ) := Pivot(M I ,P I ,P l+1 , J (-M i [ J )) 
return (P : ,M : ) 



If M] +l i+2 = and there is a non-zero coefficient farther 
on the row, then the corresponding columns are inverted at 
Step 5 and DBZ 1 goes on. If there is no such coefficient, the 
matrix has reached Shape (I) and returns at Step 4. 

DBZ 11 inputs M l and outputs a tuple (P U ,M U ) with 
M n in Shape (II). At Step 3, it cancels the columns of the 
lower-left block of M l one by one, from the last one to the 
second one, using the f's of C as pivots. At the Ith iteration, 
the lower-left block of M 1 ends with £ zero columns. 



Algorithm DBZ 
Input: M 1 in Shape (I) 

Output: (P n ,M n ) in Shape (II) such that P^M 1 ] = AP 
1: k := size of the companion block of M l 
2: (P n ,M n ) := (I, M 1 ) 
3: for j = k down to 2 do 
4: for i = k + I to n do 

5: (M n ,P n ) := Pivot(M n , P u , P 4 j_i (— M?')) 

6: return (P n ,M n ) 



DBZ inputs M with v non-zero and outputs the tuple 
(P nI ,M m ) with M IU in Shape (III). The transformation 
of M at Step 3 reverses v to put a non-zero coefficient on 
the last row of M u . Then it sets it to I at Step 4 and 
uses it as a pivot to cancel the other Vj's (Step 6). Finally, 
at Step 7, a cyclic permutation is applied to the rows and 
columns: last row becomes first, last column becomes first. 



Algorithm DBZ 

Input: M u in Shape (II), under the constraints C has di- 
mension k and v ^ 

Output: (P m , M m ) in Shape (III) where P m [M n ] = M™ 
1: Af m := M u 
2: h := max{i | M™ ^ 0} 
3: (M m ,P m ) := p'ivot(M m , P m , Inv^) 
4: (M m ,P m ) := Pivot(M m , P m , P n ,„(I/M' n 1 )) 
5: for i = fc + Iton — I do 

6: (M m ,P ln ) := Pivot(M m , P m , Ei, n (-M%)) 
7: (M m ,P m ) := Pivot(Af m , P m , Rot) ' 
8: return (P m ,M m ) 



3.3 A naive degree analysis of the generic case 

When M is generic, DBZ 1 outputs a companion matrix, so 
DBZ terminates at Step 2 in the generic case with only one 
companion matrix in the diagonal companion block decom- 
position. The proof of this fact will be given in Section 3.4. 
Therefore, the complexity of DBZ 1 is interesting in itself. 

A lower bound on this complexity is the degree of its out- 
put. We explain here why a naive analysis of DBZ 1 only 
gives an exponential upper bound on this degree. 

Let M' 1 ' be the value of M 1 just before the ith iteration of 
the loop at Step 2 (in particular, M (1) = M), and M (n) the 
output value. Remark that the matrices involved in the 
gauge transformations at Steps 6 and 8 commute with one 
another. Their product is equal to P i+ i(lnv (4+1 ' r) [M (l) ], i »). 

Lemma 9 If A £ A4 n (Kd[X]) is a generic matrix with its 
first i — 1 rows in companion form and T = Pi+i ( Ai^ ) , then 
T[A] has degree 3d. 



An exponential bound on the output of DBZ 1 is easily 
deduced: deg(M (n) ) < 3 n_1 deg(M). We will dramatically 
improve this bound in the following section. 

3.4 Algebraic interpretation and better bounds 

To prove the announced tight bound, it could in prin- 
ciple be possible to follow the same pattern as in Bareiss' 
method [4] : give an explicit form for the coefficients of the 
transformed matrices M, from which the degree analysis 
becomes obvious. But it proves more fruitful to find a link 
between CVM and DBZ, and our approach involves almost 
no computation. 

Algorithm DBZ reshapes the input matrix by successive 
elementary gauge transformations. It completely relies on 
the shape of M, M 1 , M u , and A/ 111 , while the construc- 
tion of the matrices P is only a side-effect. As illustrated 
in Section 3.3, this approach is not well suited for degree 
analysis. 

In this section, we focus on P, P , P , and P . It turns 
out that these matrices allow nice algebraic formulations, 
leading to sharp degree and complexity analyses of DBZ. 

The following lemma, whose omitted proof is immediate 
from the design of DBZ, provides the complexity of the com- 
putation of an elementary gauge transformation. 

Lemma 10 If t G K d {X] and M G M n (K d {X]), then the 
gauge transformation of M by Ei,j(t) can be computed in 
0(n M(d)) = <D(nd) operations in K. 

3.4.1 Analysis of DBZ 1 

We consider an execution of Algorithm DBZ 1 on a ma- 
trix M of denominator q, where deg(g) and deg(qM) are 
equal to d. Let P' ! ' and M^' be the values of the matri- 
ces P 1 and M l when entering the ith iteration of the loop at 
Step 2, and P (n) and M {n) the values they have at Step 9 if 
this step is reached. Set k to either the last value of i before 
returning at Step 4 or n if Step 9 is reached. Consequently, 
the companion block C of M l has dimension k. The use of 
Algorithm Pivot ensures the invariant Af' 1 ' = P^'[M] for 
all i < k. 

The following lemma gives the shape of the matrices P^*' . 

Lemma 11 For each i, P (i) = VJoin(A^(ei), Q {i) ) for some 
whose rows are in {e2, - . . , e n }. 

PROOF. The first i—1 rows of P^'[M] have companion 
shape by design of DBZ 1 . So, by Corollary 4, there exist a 
vector u and a matrix Q (i) with P w = VJoin^A^u), Q w ). 

We now prove by induction that P[ ^ = e\ and that for 
all a > i, G {e 2 , . . . , e„}. First for i = I, P (1) = /, so 
the property holds. Now, we assume the property for P' 1 ' 
and consider the ith iteration of the loop at Step 2. For j ^ 
i + I, let T''-'' denote the value of the matrix involved in the 
gauge transformation at Step 8 during the jth iteration of 
the loop at Step 7, and let T^ l+r> denote the matrix used at 
Step 6. Let r denote the integer defined at Step 3. 

The transformations of P at Steps 5, 6, and 8 imply 

p(i+l) — rp(n) . . . j,(i+2)rp(i) _ _ _ rp(l)rp(i+l)] m (i+l,r) p(i) rp^g 

first row of each matrix T&) and each matrix lnv < - 8+1,r ' is ei, 
so Pi * ^ = Pi » which is, by induction, equal to e%. 

For each integer a > i + I and each j, by definition T^ l = 
e a . Therefore, Pi*+ 1} = lnv^± 1,r) P w . Moreover, if a = r, 



then InVa^ 1 '*^ is equal to ej+i; if not, it is equal to e a . In 
both cases, by induction, Inv^+^pW G {e 2 , e„}. □ 

We are now able to give precise bounds on the degree of 
the output and the complexity of DBZ 1 , using lemma 1. 

Theorem 12 Let k be the dimension of the companion block 
output from DBZ 1 . The degree of q k ~ l P tk) is 0{kd) and the 
degree of (q k{ - k+1)/2 det(P {k) ))M {k) is 0(k 2 d). DBZ 1 has 
complexity 0(n 2 kU(k 2 d)) = 6(n 2 k 3 d). 

It is possible to give more precise bounds on the degrees 
and even to prove that they are reached in the generic case. 

Proof. By Lemmas 11 and 7, the degrees of the rows of 
diag(l, q, . . . , q'" 1 , 1, . . . , 1)P^' are upper bounded by (0, d, 
. . . , (i — l)d, 0, . . . , 0). Now Lemma 1 implies that the de- 
gree of g' (l - 1)/2 det(P (l) )P (l) " 1 is 0(i 2 d). By the invariant 
of Pivot and P[M] = <5(P) P"\ M (i) = 5(P (<) )P (i) ~\ we 
deduce that the 1cm of the denominators in M^ 1 ' divides 
Li := g l(i+1)/2 det(P (i) ) and that deg(LiM (l) ) is in 0{i 2 d). 
The degrees of the theorem follow for i = k. 

The computation of M^ !+1 ' from uses n elemen- 

tary gauge transformations on leading by Lemma 10 

to a complexity 0(n 2 M(i 2 d)). The announced complexity 
for DBZ 1 is obtained upon summation over i from 1 to k. □ 

The output matrix P 1 = P' fc ' is invertible, so i < k implies 
6*(ei) ^ Vect(A l (ei)). Therefore, k is characterised as the 
least i£N \ {0} such that <5*(ei) G Vect(A i (ei)). 

Informally, for random M, the S l (ei) are random, so most 
probably k is n. Indeed, when we experiment DBZ on ran- 
dom matrices, it always computes only one call to DBZ 1 and 
outputs a single companion matrix. We make this rigourous. 

Theorem 13 When M is generic, then DBZ has the same 
output as CVM with initial vector ei . 

Proof. For indeterminates qt and rriij^, let M be the 
n x n matrix whose (i, j)-coefficient rhij/q has numerator 
rhij = J2t=o rhi,j,kX k and denominator q = J2k=o^kX k . 
Replacing M by M formally in det(A n (ei)), we obtain a 
polynomial in the qt's and the Aij^'s. This polynomial is 
non-zero since for M = VJoin(e2, . . . , e n , ei), A n (ei) = /. 
This proves that when M is generic, ei is a cyclic vector 
for M, so Shape (I) is reached with empty a and /?, and 
DBZ behaves as DBZ 1 and as CVM with initial vector e\. □ 

3.4.2 Analysis of DBZ 11 

As for DBZ 1 , a naive analysis would lead to the conclu- 
sion that the degrees in P 11 increase exponentially during 
execution of DBZ 11 . We give an algebraic interpretation 
of P 11 that permits a tighter degree and complexity anal- 
ysis of DBZ 11 . 

In this section, we consider the computation of DBZ 11 on 
a matrix M 1 in Shape (I) whose block C has dimension k. 
Let q\ denote the denominator of M , and d\ be a common 
bound on deg(qi) and deg(giM : ). 

Let F (or r^) denote the operator on vectors or matrices 
with n — k rows defined by T(v) = f3v — v' . Observe that, 
as in Lemma 7, deg(q k T k (v)) is bounded by deg(u) + kdj. 

The loop at Step 3 processes the j's in decreasing order. 
Let P U) and M U) be the values of the matrices P n and M u 
just before executing the loop at Step 4 in DBZ 11 . 



Lemma 14 For each j, the matrix P^' has the shape 

where is a matrix of dimension (n— k) xk. Furthermore, 
for all a < j and for a = k, A* a = and for j < a < k, 



(5) 



Proof. The matrix P (fc) is the identity, owing to Step 2; 
for k < j, P^' is equal to the product of all the matri- 
ces T previously introduced for the gauge transformations 
at Step 5 for greater values of j. Each of those matrices 
has a block decomposition of the form ( Jj J). Therefore, 
their product P^ has shape (4), where A 1 -^ is the sum of 
the blocks P's. Whether j = k or j < k, for a < j and 
for a = k, and for each T, P* ja = 0; therefore, A^ a = 0. 

Since P (j) = ( J ri °), its inverse is ( °) 



M U) = p«[ M I] = 



c 







A u) C + a-r(A u) ) 13 



(6) 



By the design of DBZ 11 , A^C + a - T(A (j) ) ends with 
k — j zero columns. We consider the (a + l)th column of (6) 
and use the fact that Aj k = 0, to obtain (5). □ 

This leads to the degree and complexity analysis of DBZ 11 . 

Theorem 15 Both deg(q k ~ 1 P n ) and deg(q k M ir ) are in 
0(kdi). The complexity of DBZ 11 is 0{(n - fc) 2 fc 2 M(d/)) = 
6{{n- k) 2 k 2 di). 

Proof. The degree of P"' is equal to the degree of ^4"'. 
Equation (5) implies, after using Al k l = 0, that for each a, 



A 



U) 



Therefore, deg(q k ~ j+1 A U) ) is 0((k - j)di). From Equa- 
tion (6), it follows that the degree of q k - J+2 M u) is also 
0((k-j)di). For j = 2, we conclude that both deg(g I fc_1 P 11 ) 
and deg(q?M n ) are 0(kdi). 

The computation of M u+1) from M u) by the loop at 
Step 4 involves n — k elementary gauge transformations. 
Each one computes n — k (unbalanced) multiplications of 
elements of /3 with elements of The cost is then 

0((n-k) 2 (k- j)M(di)). We obtain the complexity of DBZ 11 
by summation over j from 2 to k. □ 

3.4.3 Analysis of DBZ 111 and DBZ 

Let (P, C* (1) , . . . , C (t) ) be the output of DBZ on M. Corol- 
lary 5 states that P = VJoin(A fel (u w ), ...,A kt (u (t) )). The 
degrees of the matrices transformed by DBZ, and thus its 
complexity, are obviously linked to the degrees of the vec- 
tors u^'. Focusing the analysis on the degree of will 
result in the exponential degree bound C(n° (n) d), which 
we believe is not pessimistic. In turn, this seems to be a 
lower bound on the complexity of DBZ. 

Conjecture The complexity of DBZ is more than exponen- 
tial in the worst case. 



We shall show that this explosion originates in the recur- 
sive calls at Step 10. Unfortunately, we have been unable to 
exhibit a matrix M leading to an execution with more than 
one recursive call, such cases being very degenerate. 

We now drop the exponent and write u for w' 1 '. As u can 
only be modified at Step 10, we consider the initial flow of 
an execution, as long as the M l, s are not companion and 
the v's are non-zero; this excludes any return at Step 2 or 7. 

Set P (I ' r) , M (I ' r) , p( u - r \ and P^ IU ' r) to the values of P 1 , 
M 1 , P n , and P m just before the rth call at Step 10. The 
matrix M' 1 ''"' has Shape (I) and is by construction gauge- 
similar to M: for some invertible P (r) , P (r) [M] = M (I ' r) , 
and, by Theorem 3, there exist u^'\ Q , and k r such that 

P (r) = VJoin(A^(u (r) ),Q (r) ). 

This leads to a new interpretation of DBZ: it tests several 
vectors v!- r ^ and iterates 5 on them to construct the matri- 
ces P^ r \ until P^[M] is companion (Step 2) or allows a 
block decomposition (Step 7). 

Theorem 16 Write P {ILr) by blocks as ( J r) °). There 
exist an integer h and a rational function w such that 

u (r+1) = w(A (r) A fc "(u M ) + Q (r) ) h ,». (7) 

PROOF. By definition, u (r+1) is the first row of p( r+1 ). 
Step 10 sets P (r+1) = p(i>*-+i)p(in,r)p(n J r) j p(r)_ Lemma 11 
implies P^ l 'J +1 ^ = ei; in addition, by Lemma 14, p( n ' r > has 
a block decomposition as in the theorem statement. So, 
u (r+i) = ppn» (j r) o) ( A *^» W >). The proof is now 

reduced to the existence of h > k r such that P[^' r ' = e^. 

In Algorithm DBZ 111 , the matrices involved in the gauge 
transformations at Step 6 commute with one another. Let S 
denote their product. Set h to the integer defined at Step 2, 
then h > k r and p( ln ' r ) = Rot S lnv (h ' n) . By construction, 
Roti. = we n for a certain rational function w 

defined at Step 4, and lnv^ h ; n ' = e^. This ends the proof. □ 

We now express the growth of deg(it' r - ) ) with respect to r. 
Let di t r denote the degrees of the numerators and denomi- 
nators of P (r) [M], so in particular a bound for u (r \ Now, 
Theorem 15 implies that the degree of the numerators and 
denominators of A (r) are 0{k r d hr ) = 0(kfd + k 2 r deg(u (r) )). 

The rational function w of the theorem is the inverse of 
an element of M' 11 ' 1 "', so the degree of its numerator and de- 
nominator are 0(kf,d + k 2 deg(u < - r ')). Combined with The- 
orem 16, this implies deg(« (r+1) ) = 0(kf,d + k 2 r deg(u M )). 

We could not deduce from Theorem 16 any polynomial 
bound on the degree of the numerator of i/ r \ but we get 
deg(w (r) ) = O(rdn 2r+3 deg(u {0) )). The worst case of this 
bound is obtained when r = n — 1. 

3.5 Link with the Abramov-Zima algorithm 

In [1], Abramov and Zima presented an algorithm, de- 
noted by AZ in the following, that computes the solutions 
of inhomogeneous linear systems Y' = MY + R in a general 
Ore polynomial ring setting. It starts by a partial uncou- 
pling to obtain a differential equation that cancels Y\ , solves 
it and injects the solutions in the initial system. We reinter- 
pret here its computations of a partial uncoupling, focusing 
on the case of systems Y' = MY where M is a polynomial 
matrix, and we analyse the complexity in the generic case. 
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Table 1: Experimental complexity of DBZ 1 , CVM, 
and their sub-algorithms 



Step 1. Introduce a new vector Z of dimension £ < n 
(generically with equality), such that Z\ = Y\ and, for i > 1, 
Zi is a linear combination of Yi, . . . , Y n , such that where /3 
is a lower-triangular matrix augmented by Is on its upper- 
diagonal: /3i,;+i = 1 for all 1 < i < I — 1. 

Step 2. Eliminate the variables Z2, ■ ■ ■ , Zi by linear combi- 
nations on the system obtained in Step 1 to get a differential 
equation of order £ that cancels Yi . 

Theorem 17 Let M be a generic matrix of dimension n 
with polynomial coefficients of degree d, then the complexity 
of AZ to uncouple the system Y' = MY is 0(n 5 d). 

Proof. When M is generic, the minimal monic differ- 
ential equation that cancels Y\ has order n and its coeffi- 
cients of orders to n — 1 are the coefficients of the vec- 
tor 8 n (e 1 )p- 1 , where we have set P = A n (ei). Thus, for 
generic M, the integer I defined in Step 1 is equal to n. 

Step 1 implies that there is an upper-triangular matrix U 
such that Ui = ei, Z = UY and U\M] = /3. At Step 2, 
the eliminations of the variables (Zi)i<i<n are carried out 
by pivot operations. They transform the system Z' = fiZ 
into a new system W' = CW with W\ = Z\ = Yi and C 
is a companion matrix, which is equal to P[M]. Because 
of the particular shape, the matrices matching those pivots 
operations are lower-triangular with Is on their diagonal. 
Their product is a matrix L such that W = LZ\ it is also 
lower-triangular with Is on its diagonal. Since P[M] = C = 
L[/3] and f3 = U[M], P = LU. 

By construction, the degree of P is 0(nd). The matri- 
ces L, U, and L~ of its LU decomposition have degrees 
0(n 2 d) [4]. Thus, the degree of /3 = U[M] = L-^PfM]] 
is 0(n 2 d). Steps 1 and 2 of AZ compute 0(n 2 ) pivot op- 
erations, each one involving 0(n) manipulations (additions 
and products) of polynomial coefficients of degree 0(n 2 d). 
This leads to the announced complexity for AZ. □ 

It can be proved that the product L ■ U in this proof is the 
LU-decomposition of P; for a non-generic M, [1] implicitly 
obtains an LUP-decomposition. 

The degree bounds in the previous proof are reached in our 
experiments: the maximal degrees of the numerator of the 
matrices /? computed for random matrices M of dimension n 
from 1 to 6 with polynomial coefficients of degree 1 are, 
respectively, 1, 2, 5, 10, 17, and 26. 

4. IMPLEMENTATION 

We have implemented the DBZ algorithm and several vari- 
ants of the CVM algorithm to evaluate the practical effi- 
ciency of our algorithmic improvements. Because of its fast 
implementations of polynomial and matrix multiplications, 
we chose the system Magma, using its release V2.16-7 on 
Intel Xeon 5160 processors (3 GHz) and 8 GB of RAM. 
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Figure 1: Timings for DBZ and CVM on input matri- 
ces M of dimension n and coefficients with fixed de- 
gree d = 15 (smaller marks) or d = 20 (larger marks) 



Our results are summarised in Table 1 and Figure 1. We 
fed our algorithms with matrices of dimension n and coef- 
ficients of degree d. Linear regression on the logarithmic 
rescaling of the data was used to obtain parameters c, e, 
and p that express the practical complexity of the algorithms 
in the form cd e n p . For the exponents p, theoretical values 
are given to compare with the experimental values. Sample 
timings for particular (n,d) are also given. BalConstr and 
NaiveConstr (resp. StorjohannSolve and NaiveSolve) compute 
the matrix P (resp. C) of Algorithm 1 with or without the 
algorithmic improvements introduced in Theorem 8. 

In Figure 1, the benchmarks of each algorithm for d = 15 
and d = 20 form two parallel straight lines, as was expected 
on a logarithmic scale. The improved algorithms are more 
efficient than their simpler counterpart. Algorithm BalCon- 
str is faster than NaiveConstr when n and d are large enough. 
It is also visible that StorjohannSolve is the dominating sub- 
algorithm of CVM. A native implementation in Magma of 
Storjohann's algorithm could well change this situation. 

The exponents e only depend of Magma's native algo- 
rithms and are not concerned by Theorem 8. We observe 
that, with respect to d, DBZ, BalConstr, and Storjohann- 
Solve have slightly better practical complexity than their 
respective couterparts CVM, NaiveConstr, and NaiveSolve. 

The practical exponent p = 3.00 of BalConstr is smaller 
than 8+1. The algorithm consists of n executions of a loop 
that contains a constant number of scans of matrices and 
matrix multiplications. By analysing their contributions to 
the complexity separately, we obtain 2.5 10- 6 d £,7 n 3 00 for 
the former, and 5.2 10" 8 d 134 n 319 for the latter. In the 
range of n we are analysing, the first contribution dominates 
because of its constant, and its exponent 3.00 is the only one 
visible on the experimental complexity of BalConstr. 

Our implementation of Storjohann's algorithm is limited 
by memory. It cannot handle matrices of dimension n or 
coefficient degree d more than 130. This bounds the size 
of the inputs manageable by our CVM implementation. A 
native Magma implementation of Storjohann's algorithm 
should improve the situation. However, our implementation 
already beats the naive matrix inversion, so that the exper- 
imental exponent 3.88 of StorjohannSolve is close to 6 + 1. 

The experimental exponent p of DBZ is 6.01 instead of 5. 
This may be explained by the fact that the matrix coeffi- 
cients that DBZ handles are fractions. Instead, in BalConstr, 
the coefficients are polynomial: denominators are extracted 
at the start of the algorithm and reintroduced at the end. 
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